ABSTRACT


The  DNA  sequence  of  the  protective  antigen  gene  from 
Bacillus  anthracis  and  the  5'  and  3'  flanking  sequences  were 
determined.  Protective  antigen  is  one  of  three  proteins 
comprising anthrax toxin. The open reading frame is 2319 base
pairs  (bp)  long,  of  which  2205  bp  encode  the  735  amino  acids  of 
the  secreted  protein.  This  region  is  preceded  by  29  codons, 
which  appear  to  encode  a  signal  peptide  having  characteristics  in 
common  with  those  of  other  secreted  proteins.  A  consensus  TATAAT 
sequence was located at the putative -10 promoter site. A
Shine-Dalgarno  site  similar  to  that  found  in  genes  of  other 
Bacillus  sp.  was  located  seven  bp  upstream  of  the  ATG  initiation 
codon.  The  codon  usage  for  the  protective  antigen  gene  reflected 
the high A + T (69%) base composition. The TAA translation stop
codon  was  followed  by  an  inverted  repeat  forming  a  potential 
termination  signal.  In  addition,  a  192-codon  open  reading  frame 
of  unknown  significance,  theoretically  encoding  a  21.6  kilodalton 
protein,  preceded  the  5'  end  of  the  protective  antigen  gene. 
Bacillus anthracis is an important pathogen of animals and
of people exposed to infected animals or their products. It can
cause cutaneous anthrax, gastrointestinal anthrax, and an often
fatal systemic pulmonary form of the disease (13, 21, 22). The
two major virulence factors of B. anthracis are a poly-D-glutamic
acid capsule and "anthrax toxin." DNA functions controlling
toxin and capsule production are carried on B. anthracis plasmids
pXO1 and pXO2, respectively (10, 31). The toxin is composed of
three separate proteins, protective antigen (PA), edema factor
(EF), and lethal factor (LF). The three proteins are nontoxic
alone. However, PA in combination with LF causes death in rats
(2), and PA combined with EF produces edema in the skin of guinea
pigs and rabbits (21, 22). In addition to mediating the toxic
effects of LF and EF, protective antigen induces immunity to
infection and is the major component of the currently licensed
human vaccine (13, 14, 21, 23, 41).

In  order  to  understand  the  role  of  PA  in  the  pathogenesis  of 
disease  and  the  induction  of  protective  immunity,  the  DNA 
encoding  PA  has  been  cloned  and  sequenced.  All  three  of  the 
toxin proteins are encoded by the 176-kilobase pair (kb) plasmid
pXO1 (31, 35, 42). Vodkin and Leppla (42) first reported the
cloning of the PA gene in Escherichia coli. The gene was
contained in a 6-kb BamHI fragment of pXO1 cloned into plasmid
pBR322. Full-size, biologically active PA was produced. The
Bacillus promoter was present, but expression of the gene by the
recombinant plasmid (pSE36) in E. coli was low.
In a recent study, we subcloned the 6-kb insert of pSE36
into the plasmid vector pUB110 and transformed B. subtilis with
the recombinant DNA (14). Two recombinants were isolated which
produced large amounts of full-size PA despite the presence of
deletions in the 6-kb insert of approximately 2.7 kb and 3.4 kb,
respectively. In vitro concentrations of PA produced by the
recombinants were similar to or greater than those observed with
B. anthracis (14). Protective antigen, a protein of approximately
85 kilodaltons (kDa) by sodium dodecyl sulfate
polyacrylamide gel electrophoresis (SDS-PAGE) (14, 21, 42),
requires a coding region of 2 - 2.5 kb.

The  purpose  of  the  present  study  was  to  map  and  sequence  the 
coding  region  of  PA.  Partial  digestion  and  religation  of  plasmid 
pSE36 (which has the 6-kb insert) yielded a smaller derivative
plasmid,  pPA26,  which  contains  a  4.2-kb  insert  encoding  full-size 
PA.  In  this  report,  the  nucleotide  sequence  of  this  insert  and 
analysis  of  the  PA  coding  region  are  presented. 
MATERIALS AND METHODS

Bacteria and plasmids. Isolates of E. coli K12 strain
HB101,  transformed  with  pSE36  or  pPA26,  were  the  sources  of 
plasmid  DNA;  and  strain  JM103  (29)  was  used  to  propagate  M13 
phage  derivatives. 

Subcloning and detection of PA-producing recombinants. The
isolation of recombinant E. coli (pSE36) has been described
(42). Briefly, pSE36 consists of plasmid pBR322 with a 6-kb
BamHI fragment encoding the PA protein from plasmid pXO1 of
B. anthracis. To obtain derivatives having smaller insert DNA,
plasmid pSE36 DNA was partially digested with HindIII and
religated. E. coli strain HB101 was transformed with the plasmid
DNA, and recombinants were tested for the presence of the PA gene
by immunological assay (42). The size and biological activity of
PA produced by the recombinants were tested by a Western blot
procedure and the CHO cell elongation assay, respectively (14,
20, 42).

Isolation of DNA. Plasmid pPA26 DNA was prepared from
cleared lysates by ultracentrifugation in cesium chloride/ethidium
bromide gradients according to methods described by
Maniatis et al. (25). The DNA was digested simultaneously with
HindIII and BamHI, and the 4.2-kb insert encoding PA was isolated
as 2.2-kb HindIII and 2.0-kb HindIII-BamHI fragments (Fig. 1).
The DNA fragments were purified by preparative gel
electrophoresis, the bands excised, and the DNA extracted with
phenol for cloning in M13.
Nucleotide sequence analysis. The two fragments were each
cloned into phages M13mp10 and M13mp11, and the dideoxy chain
termination method (30, 30) was used to sequence the
DNA. Initially, data were collected by using the universal primer
(Pharmacia P-L Biochemicals, Piscataway, N.J.). Using these data,
we synthesized oligonucleotide primers 18 nucleotides long to
collect each additional data segment (37). The oligonucleotides
were prepared by the phosphoramidite method (Applied Biosystems,
Foster City, Calif.). The products of the sequencing reactions
were separated in 7% denaturing polyacrylamide gels, and data
read from the autoradiograms were compiled and melded by using
the GEL program in the IntelliGenetics Molecular Biology software
package (IntelliGenetics, Inc., Mountain View, Calif.).

Enzymes and reagents. Restriction endonucleases were
purchased from International Biotechnologies, Inc. (New Haven,
Conn.) and Bethesda Research Laboratories (Gaithersburg, Md.) and
were used as recommended by the suppliers. T4 DNA ligase and
deoxynucleoside and dideoxynucleoside triphosphates were from
Pharmacia P-L. Klenow fragment was purchased from Boehringer
Mannheim Biochemicals (Indianapolis, Ind.), and
α-32P-deoxynucleoside triphosphates (300 - 800 curies/mmole, 11.1
- 29.6 TBq/mmole) were from Amersham (Arlington Heights, Ill.).

Computer  analysis  of  DNA  sequence  and  protein  secondary 
structure.  The  sequence  in  pPA26  of  B.  anthracis  DNA  was 
analyzed by several computer software packages. The MOLGENJR
programs (J. R. Lowe, Fed. Proc. 45:1582, 1986) were run on an
IBM-PC microcomputer to confirm experimental restriction enzyme
cleavage patterns, deduce open reading frames, and translate the
primary DNA sequence into amino acid sequences. Other programs
from the MOLGENJR package were used to examine the translated
peptide sequences, calculate codon usage and polypeptide
molecular weights, and plot hydropathy and secondary structure
histograms. We used a VAX 750 minicomputer, executing program
SEQ in the IntelliGenetics Molecular Biology software package, to
search for regions of dyad symmetry and to calculate free
energies of base pairing in potential DNA hairpin secondary
structure. Other unpublished programs and algorithms were used
to search for potential activation sequences (ENHANCE2.MSB) and
significant open reading frames (ORFREAL.MSB) and to create
condensed, dot-matrix hydropathy and secondary structure
histograms (AGNAKDCF.MSB). These are available from J. R. Lowe.
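
The open reading frame deduction mentioned above can be illustrated with a minimal Python sketch. It is not the MOLGENJR or ORFREAL.MSB code, which is not reproduced in this report; the function name `find_orfs`, the `min_codons` parameter, and the toy sequence are assumptions made for illustration only, and only the strand supplied is scanned.

```python
# Hypothetical illustration: scan the three forward frames of a DNA string
# for ATG-initiated open reading frames ending at a stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return sorted (start, end, n_codons) tuples for the given strand.

    start: 1-based position of the A of the first in-frame ATG
    end:   1-based position of the last base of the stop codon
    n_codons: number of sense codons (stop codon excluded)
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the frame
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3, n_codons))
                start = None                   # look for the next ATG
    return sorted(orfs)


# Toy sequence (not from pPA26): ATG, four AAA codons, then TAA.
toy = "CC" + "ATG" + "AAA" * 4 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))   # [(3, 20, 5)]
```

With `min_codons=100`, the default, the same scan would report only frames at least as long as the 100-codon threshold used in this study.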
RESULTS AND DISCUSSION

Cloning of the PA gene and sequencing strategy. The
protective antigen gene of B. anthracis was originally cloned
into pBR322 as a 6-kb insert (42). Digestion and religation of
the recombinant plasmid pSE36 yielded a smaller plasmid with a
4.2-kb insert (pPA26), which retained the PA gene (Figure 1A).
E. coli transformants of pSE36 and pPA26 both produced proteins
of about 83 kDa on SDS-PAGE which reacted specifically with
anti-PA antibody on Western blot analysis and were biologically
active in the CHO cell elongation assay (20; data not shown). To
determine the location and direction of transcription of the PA
gene, the 4.2-kb insert was excised, digested with HindIII and
BamHI into two fragments of 2.0 kb and 2.2 kb, and sequenced as
indicated in Figure 1B and 1C.

Nucleotide  sequence  analysis  -  PA. 

(i) Open reading frame. The nucleotide sequence of the PA
gene is shown in Figure 2. Analysis of the sequence revealed
an open reading frame 2319 base pairs (bp) long. The structural
gene for the mature protein began at nucleotide 1891, coding for
a glutamic acid residue, and the translated sequence was in
agreement with both the N-terminal amino acid sequence and the
amino acid composition determined previously. The coding region
for this portion of the PA gene was 2205 bp long, encoding a
735-amino acid protein with a theoretical molecular weight of 82,684
daltons. The size of the mature PA protein as determined by
sequence analysis was similar to that estimated by SDS-PAGE
analysis of PA from Bacillus culture supernatants, 83 - 85 kDa
(14, 21, 42). The final residue of the coding region (glycine)
was followed by the consensus TAA stop codon (nucleotide 4096).
Thus, as indicated in Figure 1C, all except the N-terminal 53
amino acids were encoded within the 2.0-kb HindIII-BamHI fragment
at the 3' end of the 4.2-kb B. anthracis insert. This location
of the gene at the end of the insert confirms the position of the
PA gene mapped in recently isolated B. subtilis recombinants
(14). In that study, cloning of the B. anthracis insert into
B. subtilis (pUB110) yielded two plasmid recombinants with
deletions at the 5' end of the insert. The smaller recombinant
plasmid retained just 2.6 kb of DNA at the 3' end of the PA
insert but produced full-sized, functional PA (14).
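
The coordinates quoted in this section can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated in the text (a proposed start codon at nucleotide 1804, the mature protein beginning at 1891, 735 codons, and a stop codon at 4096); the variable names are illustrative.

```python
# Coordinates as given in the text (1-based nucleotide numbering of Fig. 2).
SIGNAL_START = 1804       # ATG proposed as the translation start
MATURE_START = 1891       # first codon (Glu) of the mature protein
MATURE_CODONS = 735       # residues in the secreted protein
STOP_CODON_START = 4096   # position reported for the TAA stop codon

# Codons between the proposed start and the mature N terminus.
signal_codons = (MATURE_START - SIGNAL_START) // 3
# Length of the mature coding region in base pairs.
mature_bp = MATURE_CODONS * 3
# First nucleotide after the last mature codon, i.e. where the stop begins.
stop_start = MATURE_START + mature_bp

print(signal_codons)   # 29
print(mature_bp)       # 2205
print(stop_start)      # 4096
```

The three values agree with the 29-codon signal peptide, the 2205-bp mature coding region, and the stop codon position reported here.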

Preceding the sequence encoding the 83 kDa PA protein
(starting at nucleotide 1891) were two ATG codons in phase with
the open reading frame, at nucleotides 1834 and 1804. Similar to
other Bacillus proteins, PA is a secreted protein and is probably
synthesized as a precursor having a signal peptide. The
methionine codon at nucleotide 1804 appears to be the likely
starting point for translation. It would initiate a sequence
having several characteristics in common with other Bacillus
signal sequences that have been identified. The 29-residue
peptide that would be encoded is typical of the size of other
Bacillus signal sequences (3, 17, 19, 31, 44, 46). Also, the
positively charged, N-terminal five amino acids
(Met-Lys-Lys-Arg-Lys), the hydrophobic central region (residues 8
to 21), and the terminal alanine residue are characteristic of
bacterial signal peptides (3, 17, 19, 31, 44, 46).

(ii) Transcription and translation regulatory regions. A
putative Shine-Dalgarno ribosomal binding site, indicated in
Figure 2, is located seven bp upstream of the ATG codon at
nucleotide 1804. The sequence of this site, AAAGGAG, and the
distance separating it from the initiation codon closely
resemble the characteristics of the Shine-Dalgarno sites reported
for several other Bacillus sp. genes (5, 26, 33, 43, 46). The
Shine-Dalgarno sequence has a calculated binding energy with
B. subtilis 16S rRNA of -14.0 kcal/mole (1, 27, 40). Possible
promoter sequences are underlined in Figure 2. The putative RNA
polymerase recognition site (TATAAT) at nucleotide 1764 is
identical to the E. coli and B. subtilis σ43 -10 consensus
sequences. The 6-base sequence starting at nucleotide 1738, and
separated by 20 bp from the -10 site, resembles the conserved -35
site of E. coli and the -35 site reported for genes of
gram-positive organisms (36). The optimal distance between the -10
and -35 RNA polymerase recognition regions in B. anthracis genes
is unknown. In E. coli, these sequences are separated by 16 to
19 bp, with 17 being the most frequent and resulting in maximal
promoter strength (36). Bacillus promoters, especially those
recognized by σ43-containing RNA polymerase, are often similar in
their sequence and spacing to E. coli promoters; however, several
different promoter sequences have been identified (8, 15, 43,
44). Also, distances between the two promoter regions as long as
21 bp have been reported for other sequences, e.g., the pertussis
toxin gene (24). In vitro and in vivo transcription analyses
will be necessary to locate the precise promoter region for the
PA gene. An inverted repeat forming a potential termination
structure was located 3' of the translation stop codon, as shown
in Figure 3A. The putative hairpin structure contained 19
complementary nucleotide pairs and two T-G mismatches between
nucleotides 4142 and 4188. The structure had a strong predicted
free energy of base-pair formation (ΔG° = -22.2 kcal/mole).
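
The promoter spacing discussed in this section follows directly from the reported coordinates. The Python sketch below assumes that nucleotide 1738 is the first base of the 6-base -35 sequence and that nucleotide 1764 is the first base of TATAAT; the text gives the positions but does not state this convention explicitly, and the variable names are illustrative.

```python
# Positions as reported in the text; treating each as the first base of
# its hexamer is an assumption of this sketch.
MINUS35_START = 1738
MINUS10_START = 1764
HEXAMER_LEN = 6

# Spacing range cited for E. coli promoters (16 to 19 bp inclusive).
ECOLI_SPACER_RANGE = range(16, 20)

# Bases lying strictly between the end of the -35 hexamer and the
# start of the -10 hexamer.
spacer = MINUS10_START - (MINUS35_START + HEXAMER_LEN)

print(spacer)                         # 20
print(spacer in ECOLI_SPACER_RANGE)   # False
```

Under that assumption the spacer is 20 bp, one base longer than the upper end of the E. coli range but within the 21 bp reported for other promoters.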

We observed three additional regions forming potential
stem-and-loop structures, which showed significant probabilities
and negative free energies of formation. The sequence from
nucleotides 868 to 926 (Fig. 3B) was inside the 192-codon open
reading frame and had a strong calculated ΔG° of -25.4
kcal/mole. The second region of dyad symmetry (Fig. 3C), from
nucleotides 1263 to 1346, had a predicted ΔG° = -19.6 kcal/mole.
The third region (Fig. 3D) spanned the PA promoter from
nucleotides 1722 to 1779 and had a predicted ΔG° = -15.8
kcal/mole. If any of these regions is recognized as a
transcriptional terminator in E. coli, its presence could
explain the low PA expression from the original clones
(5 - 10 ng PA/ml) (42).
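
A search for dyad symmetry of the kind described here can be sketched in a few lines of Python. This is not the SEQ program of the IntelliGenetics package: it finds only perfect inverted repeats, allows no mismatches (so it would not recover the T-G pairs noted for the terminator), and computes no free energy. The function names, parameters, and toy sequence are assumptions for illustration.

```python
# Hypothetical illustration: locate perfect inverted repeats (stems)
# separated by a short loop in a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem=6, min_loop=3, max_loop=8):
    """Return (left_start, right_start, left_arm) with 1-based starts."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)   # sequence the right arm must have
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop             # start of the candidate right arm
            if j + stem > n:
                break
            if seq[j:j + stem] == target:
                hits.append((i + 1, j + 1, left))
    return hits


# Toy sequence (not from pPA26): a 6-bp stem with a 4-base loop.
toy = "TT" + "GCGCAT" + "AAAA" + "ATGCGC" + "TT"
print(find_inverted_repeats(toy))   # [(3, 13, 'GCGCAT')]
```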

(iii) Base composition and codon usage. The base
composition of the coding strand of the PA gene was: A = 39%, T =
30% (A + T = 69% of total), G = 17%, C = 14% (G + C = 31%). The
codon usage is shown in Table 1. There was a preference for A
and T at the third position in the codons, which might reflect
the high A + T content. The codon usage was similar to that of
another Bacillus gene of plasmid origin, the crystal protein
toxin gene of the related species B. thuringiensis (39). A major
difference between the two codon profiles was the lack of
cysteine residues in PA. The codon usage in the PA gene differed
from that in genes for toxins and other proteins produced by
other gram-positive and gram-negative bacteria (Table 1 and data
not shown).
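
The two quantities reported in this section, base composition and within-group percentage codon usage, can be computed as in the Python sketch below. It is an illustration under stated assumptions rather than the MOLGENJR routine: the function names and the toy coding sequence are hypothetical, the standard genetic code is assumed, and codons are written with T (DNA) where Table 1 uses U.

```python
from collections import Counter, defaultdict

# Standard genetic code, built in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def base_composition(seq):
    """Return the percentage of A, C, G, and T in a DNA string."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def within_group_codon_usage(cds):
    """Return {amino_acid: {codon: percent of that amino acid's codons}}."""
    cds = cds.upper()
    counts = defaultdict(Counter)
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":
            continue                      # skip stops and ambiguous codons
        counts[aa][codon] += 1
    return {aa: {codon: 100.0 * k / sum(c.values()) for codon, k in c.items()}
            for aa, c in counts.items()}


# Toy coding sequence (not the PA gene): Met-Lys-Lys-Lys-Glu-Glu-stop.
toy_cds = "ATGAAAAAAAAGGAAGAGTAA"
usage = within_group_codon_usage(toy_cds)
print({c: round(p, 1) for c, p in usage["K"].items()})  # {'AAA': 66.7, 'AAG': 33.3}
print(round(base_composition(toy_cds)["A"], 1))         # 66.7
```

Within each amino acid the percentages sum to 100, which is the property of the within-group values listed in Table 1.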

Analysis of protein structure from the nucleotide sequence.
The prediction of the amino acid sequence of PA and the deduction
of protein structural information were performed by algorithms of
the computer programs described above (J. R. Lowe, Fed. Proc.
45:1582, 1986 and other unpublished programs). The algorithms
used to predict the hydropathic profile and the protein secondary
structure are based on the methods of Kyte and Doolittle (18) and
Chou and Fasman (4), respectively. These predictions are shown
in Figure 4. The amino-terminal portion of the putative signal
peptide is hydrophilic, whereas the central core is hydrophobic,
as expected from comparisons with similar analyses of other
proteins with confirmed signal sequences (data not shown).
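
A hydropathy profile of the Kyte and Doolittle type is a moving average of per-residue hydropathy values. The Python sketch below is a minimal illustration, not the AGNAKDCF program used for Figure 4; the function name and the window length are assumptions, and the only sequence taken from this report is the N-terminal Met-Lys-Lys-Arg-Lys of the putative signal peptide.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
# (one-letter codes); positive values are hydrophobic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Return the mean hydropathy of each full window along the sequence."""
    scores = [KD[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# N-terminal five residues of the putative signal peptide (from the text).
n_terminus = "MKKRK"
profile = hydropathy_profile(n_terminus, window=5)
print(round(profile[0], 2))   # -2.86
```

The strongly negative mean for Met-Lys-Lys-Arg-Lys is consistent with the hydrophilic, positively charged amino terminus described above.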

Regions of the sequence upstream of the PA gene. Other
open reading frames, in addition to the longest one of 2319 bp
encoding PA, were found in the 4.2-kb sequence. The only open
reading frame at least 100 codons long was a 576-nucleotide
sequence (ORF1) beginning with an ATG at position 41$ upstream of
the PA gene. The 192-codon open reading frame encodes a
polypeptide with a calculated Mr of 21,610 daltons. The codon
usage of the translated region is similar to that observed for
the PA gene (Table 1). A computer analysis (ORFREAL.MSB) of this
open reading frame according to the method of Fickett (7)
calculated a 92% coding probability. A similar analysis of the
PA coding region also gave a 92% coding probability. Potential
-10 and -35 RNA polymerase recognition sites, but no consensus
Shine-Dalgarno site, on the 5' side of the cryptic open reading
frame were identified. The open reading frame terminated with a
TAG stop codon. Figure 5 is a hydropathy plot and secondary
structure analysis of the putative protein. The sequence does
not appear to encode a signal peptide but does have an
interesting carboxy terminus rich in hydrophobic residues
embedded in a region with a high probability for β-sheet
structure. This suggests that the protein could be
membrane-bound at its carboxy terminus. The significance of this putative
gene is unknown and awaits analysis of expression experiments
using the cloned plasmid DNA.

The availability of the complete nucleotide sequence of the
PA gene of B. anthracis will serve several useful purposes. For
example, the promoter sequence of PA can be probed by
promoter-probing vectors and the sequence altered by
site-specific mutagenesis. Thus, enhanced production of cloned
PA in the B. subtilis or E. coli hosts will become feasible.
Also, specific mutagenesis of the PA coding region could be done
to: (1) produce immunogenic, biologically inactive cross-reactive
proteins for vaccine studies (13, 23, 41); or (2) examine the
role of different domains of PA in binding to target cell
membranes and to the EF and LF components of anthrax toxin (9,
20, 32). Finally, segments of the PA nucleotide sequence will be
used as probes to examine the genetic organization of PA in
variant strains of B. anthracis (23, 41).
FIG. 1 - Construction of plasmid containing the PA gene and
sequencing strategy.

(A) Plasmid pPA26 was constructed from pSE36 by partial
HindIII digestion. The 4.2-kb HindIII-BamHI portion of the
plasmid (open box) contains the PA gene. The distances (in kb)
between the BamHI (B) and HindIII (H) sites are indicated; EcoRI
sites (E) are included.

(B) To sequence this insert, pPA26 was digested with BamHI
and HindIII. The 2.2-kb HindIII and 2.0-kb HindIII-BamHI
fragments were isolated and cloned into M13mp10 and M13mp11.

(C) The arrows indicate the direction and extent of sequencing of the
DNA fragments, totalling 4235 bp. The hatched bar indicates the
structural gene for the mature PA protein.

FIG. 2 - Nucleotide and amino acid sequence of the PA gene and 5'
and 3' flanking sequences. The sequence shown corresponds to
nucleotides 1 - 4235 on the map in Fig. 1. Restriction
endonuclease sites described in Fig. 1 and in the text are
indicated. The presumptive -35 and -10 sequences, and
Shine-Dalgarno ribosomal binding site (rbs) of the PA gene and of the
potential 192-codon open reading frame are underlined, as
are the translation start (ATG) and stop (TAG, TAA) codons.
Arrows above the nucleotide sequence indicate initiation of
translation of the potential open reading frame upstream of the
PA gene (ORF1) and of the signal sequence (SIG) and mature
protein (MAT) of the PA gene. The 29-residue signal peptide is
underlined. The translated amino acid sequences of the
192-codon open reading frame and the PA gene, only, are shown.
The potential stem-loop termination structure flanking the 3' end
of the PA gene, and the three palindromic sequences on the 5'
side of the PA gene are indicated by dashed lines between outward
pointing arrowheads above the sequences.

FIG. 3 - Possible stem and loop structures found in the upstream
and downstream sequences from the PA gene and in the putative
peptide coding region. Numbering corresponds with that of Fig.
2. The calculated free energies of these conformations were (A)
ΔG° = -22.2 kcal/mole, (B) ΔG° = -25.4 kcal/mole, (C) ΔG° =
-19.6 kcal/mole, and (D) ΔG° = -15.8 kcal/mole.

FIG. 4 - Combination hydropathy/secondary structure plot of PA.
The left margin contains the amino acid sequence of PA. Residue
numbers are scaled on the right ordinate. The abscissa units are
hydropathy values. Dot positions on the left portion of the plot
indicate the most probable secondary structure feature predicted.
Headings are H for helix, S for β-sheet, T for β-turn, and R for
random coil. Region A is the hydrophobic signal sequence with
its highly charged amino terminus. Regions B-J are potential
antigenic sites in the sequence. Dot-matrix output was
computer-generated with program AGNAKDCF.EXE. Algorithms were developed
according to the schemes of Kyte and Doolittle (18) and Chou and
Fasman (4).

FIG.  5  -  Combination  hydropathy/secondary  structure  plot  of  the 
putative  peptide.  Plot  organization  and  generation  are  the  same 
as  described  in  Figure  4.  Region  A  is  a  long,  highly  hydrophobic 
28-residue sequence with mostly β-sheet structure predicted.
Regions  B-D  are  potential  antigenic  sites  in  the  sequence. 
[FIG. 2. Sequence listing: nucleotide sequence and deduced amino acid sequence of the PA gene and flanking regions, numbered as in Fig. 1.]
Table 1. Amino acid composition of PA and codon usage comparisons(a)

                 Within-group percentage codon usage for gene(b)
Amino  No.  Codon      PA    ORF1  CryPro  TetTox    DTox    EntB  PToxS3    ToxA    CTxA
acid   (c)           B.a.    B.a.    B.t.    C.t.    C.d.    S.a.    B.p.    E.c.    V.c.

Ala     41  GCU      31.7    12.5    31.3    37.5    42.2    42.9     4.0    28.6    23.1
            GCC       4.9     0.0    12.5    12.5    11.1     0.0    56.0     0.0     7.7
            GCA      46.3    75.0    32.8    41.7    20.0    57.1     8.0    50.0    53.8
            GCG      17.1    12.5    23.4     8.3    26.7     0.0    32.0    21.4    15.4

Arg     29  CGU      10.3     0.0    21.3     5.3    35.3    16.7     0.0    18.2     0.0
            CGC       0.0     0.0     5.3     0.0     5.9     0.0    64.3     0.0     0.0
            CGA      10.3    50.0    13.3     5.3    11.8    33.3     0.0     4.5     0.0
            CGG      13.8     0.0     2.7     0.0     5.9    16.7     7.1     9.1     0.0
            AGA      55.2    50.0    44.0    57.9    17.6    33.3     7.1    54.5    33.3
            AGG      10.3     0.0    13.3    31.6    23.5     0.0    21.4    13.6    66.7

Asn     69  AAU      76.8    71.4    74.4    81.7    80.0    71.4     0.0    87.5    85.7
            AAC      23.2    28.6    25.6    18.3    20.0    28.6   100.0    12.5    14.3

Asp     47  GAU      87.2    93.8    79.4    92.3    71.4    70.8    12.5    62.5   100.0
            GAC      12.8     6.2    20.6     7.7    28.6    29.2    87.5    37.5     0.0

Cys      0  UGU         -       -    64.7    75.0    50.0   100.0     0.0   100.0   100.0
            UGC         -       -    35.3    25.0    50.0     0.0   100.0     0.0     0.0

Gln     31  CAA      83.9   100.0    81.6    84.6    87.5   100.0    30.0    45.5   100.0
            CAG      16.1     0.0    18.4    15.4    12.5     0.0    70.0    54.5     0.0

Glu     51  GAA      74.5   100.0    70.7    82.1    59.5    66.7    71.4    53.8    85.7
            GAG      25.5     0.0    29.3    17.9    40.5    33.3    28.6    46.2    14.3

Gly     36  GGU      11.1    41.7    25.0    35.7    37.0    44.4     9.5    35.0    50.0
            GGC       5.6     8.3    12.5    10.7    13.0     0.0    61.9    15.0     0.0
            GGA      52.8    41.6    45.0    50.0    21.7    44.4    14.3    45.0    50.0
            GGG      30.6     8.3    17.5     3.6    28.3    11.1    14.3     5.0     0.0

His     10  CAU      90.0    66.7    90.9    85.7    58.8    66.7    25.0    50.0    75.0
            CAC      10.0    33.3     9.1    14.3    41.2    33.3    75.0    50.0    25.0

Ile     57  AUU      50.9    50.0    56.3    36.8    36.1    57.1    18.8    38.9    66.7
            AUC      17.5     0.0    23.9     1.8    25.0     0.0    56.3     0.0     8.3
            AUA      31.6    50.0    19.7    61.4    38.9    42.9    25.0    61.1    25.0

Leu     62  UUA      67.7    52.6    45.0    60.4    20.0    52.4     0.0    61.1    33.3
            UUG      12.9     5.3    11.0    13.2    15.0    23.8    12.5     5.6    11.1
            CUU       9.7    10.5    22.0    11.3    25.0     4.8     8.3    16.7    11.1
            CUC       3.2     5.3     2.0     0.0    10.0     4.8    20.8     5.6     0.0
            CUA       4.8    21.0    15.0    11.3    20.0     9.5     0.0     0.0    33.3
            CUG       1.6     5.3     7.0     3.8    10.0     4.8    58.3    11.1    11.1

Lys     60  AAA      78.3    75.0    72.7    92.0    72.5    76.5     0.0    83.3    63.6
            AAG      21.7    25.0    27.3     8.0    27.5    23.5   100.0    16.7    36.4

Met     10  AUG     100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0

Phe     24  UUU      79.2    77.8    75.9    95.7    78.9    85.7     0.0    66.7   100.0
            UUC      20.8    22.2    24.1     4.3    21.1    14.3   100.0    33.3     0.0

Pro     29  CCU      31.0    50.0    33.9    41.2    39.1    42.9     0.0     7.7    66.7
            CCC      10.3     0.0     5.4     5.9    13.0    14.3    18.2    23.1     0.0
            CCA      37.9    50.0    42.9    47.1    26.1    42.9    18.2    53.8    33.3
            CCG      20.7     0.0    17.9     5.9    21.7     0.0    63.6    15.4     0.0

Ser     72  UCU      30.6    21.4    19.8    45.7    27.8    43.8     0.0    26.3    22.2
            UCC       4.2     0.0    15.1     2.2     7.4     0.0    30.0    15.8     0.0
            UCA      20.9    35.7    27.9    23.9    13.0    12.5     0.0    31.6    33.3
            UCG       9.7     7.1     7.0     2.2    18.5    18.8    20.0     5.3    11.1
            AGU      31.9    14.3    22.1    17.4    13.0    18.8     0.0    10.5    33.3
            AGC       2.8    21.4     8.1     8.7    20.4     6.3    50.0    10.5     0.0

Thr     58  ACU      32.8    33.3    28.4    33.3    40.0    64.3     5.9    41.7    30.0
            ACC      10.3    11.1    16.2    16.7    20.0     0.0    47.1    16.7    10.0
            ACA      37.9    55.6    31.1    50.0    23.3    14.3     5.9    41.7    40.0
            ACG      19.0     0.0    24.3     0.0    16.7    21.4    41.2     0.0    20.0

Trp      7  UGG     100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0

Tyr     28  UAU      78.6    81.8    76.9    88.2    72.2    77.3    42.1    73.9   100.0
            UAC      21.4    18.2    23.1    11.8    27.8    22.7    57.9    26.1     0.0

Val     43  GUU      27.9    52.6    25.9    42.9    36.2    42.1     9.1    63.6    33.3
            GUC       4.7     0.0    14.8     0.0    12.8     5.3    54.5     9.1    16.7
            GUA      39.5    42.1    40.7    50.0    31.9    31.6    18.2    18.2    50.0
            GUG      27.9     5.3    18.5     7.1    19.1    21.1    18.2     9.1     0.0

Mr                  85787   21610  133047   65900   60753   31433   24589   29862   13509

(a) Within-group percentage codon usage calculated with MOLGENJR software package (J. R. Lowe, Fed. Proc. 45:1582, 1986).

(b) The following genes from the species listed were examined.
    B.a. PA   B.a.  Bacillus anthracis protective antigen gene
    ORF1      B.a.  Bacillus anthracis hypothetical protein gene 1 on pXO1 plasmid (35)
    Cry Pro   B.t.  Bacillus thuringiensis crystal protein gene
    TetTox    C.t.  Clostridium tetani tetanus toxin gene (6)
    DTox      C.d.  Corynebacterium diphtheriae diphtheria toxin gene (11)
    EntB      S.a.  Staphylococcus aureus enterotoxin B gene (16)
    PToxS3    B.p.  Bordetella pertussis pertussis toxin S3 binding subunit gene (24)
    ToxA      E.c.  Escherichia coli heat-labile enterotoxin A gene (45)
    CTxA      V.c.  Vibrio cholerae cholera toxin alpha subunit gene (28)

(c) Total number of specific amino acid residues deduced from protective antigen gene.


